bench: reset stateful benchmarks between timed calls#130
Draft
bench: reset stateful benchmarks between timed calls#130
Conversation
`WalletBalanceMine` duplicated `WalletBalanceClean` exactly. Both registrations called `WalletBalance(bench, /*set_dirty=*/false, /*add_mine=*/true)`. No runtime reproducer is needed here because the duplicate is visible in the registration code itself. Remove the duplicate registration so the balance benchmark list stays distinct.
`MempoolCheckEphemeralSpends` only filled `tx2.vin[0]` in a loop. That left the rest of the inputs with default prevouts and built the wrong package shape. Write each prevout to `vin[i]` instead and assert that the last child input spends the last parent output.
`setup()` in nanobench runs once per epoch, while an epoch can still execute the benchmark body multiple times. With the default multi-epoch benchmarking, that makes it easy to mistake per-sample setup for per-iteration setup and silently benchmark changing state inside one sample. Fail fast when `setup()` is combined with anything other than `epochIterations(1)`.
Add commented-out assertions ahead of the fixes in this stack. Each assertion can be uncommented on the pre-fix code to confirm the benchmark does not start each timed call from the same state. The follow-up commits uncomment these checks and make them pass.
`CHACHA20` and `FSCHACHA20POLY1305` kept one cipher object alive across timed calls. That advanced the stream position, and `FSChaCha20Poly1305` eventually crossed a rekey boundary. The benchmark now keeps the same reviewer check in place. A fresh zero-key `CHACHA20` stream starts with `0x76`, and a fresh zero-key `FSChaCha20Poly1305` encryption starts with `0x9f`. Those assertions fail on the old code on the second timed call. Rebuild the cipher in `setup()` for each timed call so the measured work starts from the same state.
`BenchLockedPool` carried allocator state across timed calls. That changed the work performed after the first call. The benchmark keeps the same reviewer check in place. The initial seed must still be `INITIAL_STATE`, and that assertion fails on the old code on the second timed call. Rebuild the arena in `setup()` so each timed call starts from the same allocator state.
The pool map benchmarks cleared their maps inside the timed call but reused the grown bucket state. That changed the insertion preconditions for later iterations. The benchmark keeps the same reviewer check in place. It records the reset bucket count and asserts that each timed call starts from that same bucket layout. That fails on the old code on the second timed call. Reset the maps in `setup()` with `rehash(0)` so each timed call starts from the same empty bucket layout.
`WriteBlockBench` kept appending to one block manager across timed calls.
That changed the write position for later iterations.
The benchmark keeps the same reviewer check in place.
A fresh block manager should write to `FlatFilePos{0, STORAGE_HEADER_BYTES}`, and that assertion fails on the old code on the second timed call.
Rebuild the testing setup in `setup()` so each timed call writes into the same fresh block storage state.
`RollingBloom` kept one filter alive across timed calls. That changed the filter state and the later work it performed. The benchmark keeps the same reviewer check in place. A fresh timed call should not already contain the first inserted value, and that assertion fails on the old code on the second timed call. Reset the filter in `setup()` so each timed call starts from the same empty state.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This stack fixes several
bench_bitcoinbenchmarks that carried mutable state across timed calls.Later timed calls could start from meaningfully different state than the first one and end up measuring different work.
To make that easier to review, the stack first adds commented reviewer checks.
Those checks can be uncommented locally on the pre-fix code to show that the second timed call no longer starts from the same state as the first.
The follow-up commits then keep those checks in the same form and make them pass.
This stack:
nanobench::setup()against use with multi-iteration epochsThis stack intentionally does not try to remove every small source of variation.
It only fixes benchmarks where the carried state changed the work in a meaningful way.